An Optimal k Nearest Neighbours Ensemble for Classification Based on Extended Neighbourhood Rule with Features subspace
To minimize the effect of outliers, kNN ensembles identify a set of observations closest to a new sample point and estimate its unknown class by majority voting on the labels of the training instances in that neighbourhood. Ordinary kNN-based procedures determine the k closest training observations in the neighbourhood region (enclosed by a sphere) using a distance formula. The k nearest neighbours procedure may fail when the test points follow the pattern of nearest observations lying along a path that is not contained in the given sphere of nearest neighbours. Furthermore, these methods combine hundreds of base kNN learners, many of which might have high classification errors, resulting in poor ensembles. To overcome these problems, an optimal extended neighbourhood rule based ensemble is proposed, where the neighbours are determined in k steps. The rule starts from the sample point nearest to the unseen observation; the second data point selected is the one closest to the previously selected point. This process continues until the required k observations are obtained. Each base model in the ensemble is constructed on a bootstrap sample in conjunction with a random subset of features. After building a sufficiently large number of base models, the optimal models are selected based on their performance on out-of-bag (OOB) data.

Comment: 12 pages
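As a concrete illustration of the two-stage procedure described above, below is a minimal Python sketch, assuming Euclidean distance and majority voting; the function and parameter names (enr_neighbours, n_keep, etc.) are our own illustrative choices, not the authors' implementation.

    import numpy as np

    def enr_neighbours(X_train, x_test, k):
        # Extended neighbourhood rule: take the training point closest to
        # x_test, then repeatedly take the unused point closest to the
        # previously selected one, giving a chain of k neighbours.
        available = list(range(len(X_train)))
        anchor = x_test
        chain = []
        for _ in range(k):
            dists = np.linalg.norm(X_train[available] - anchor, axis=1)
            nearest = available[int(np.argmin(dists))]
            chain.append(nearest)
            available.remove(nearest)
            anchor = X_train[nearest]  # next step starts from this point
        return chain

    def enr_predict(X_train, y_train, x_test, k):
        # Majority vote over the labels of the chained neighbours.
        idx = enr_neighbours(X_train, x_test, k)
        labels, counts = np.unique(y_train[idx], return_counts=True)
        return labels[np.argmax(counts)]

    def fit_enr_ensemble(X, y, n_models=200, n_feats=5, k=5, n_keep=50, seed=0):
        # Build base ENR learners on bootstrap samples with random feature
        # subsets, score each on its out-of-bag (OOB) rows, and keep the
        # n_keep models with the lowest OOB error (n_keep is an assumption;
        # the abstract only says the optimal models are selected on OOB data).
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(seed)
        n, p = X.shape
        scored = []
        for _ in range(n_models):
            boot = rng.integers(0, n, size=n)                   # bootstrap rows
            feats = rng.choice(p, size=n_feats, replace=False)  # random subspace
            oob = np.setdiff1d(np.arange(n), boot)              # unused rows
            errs = [enr_predict(X[boot][:, feats], y[boot], X[i, feats], k) != y[i]
                    for i in oob]
            scored.append((np.mean(errs) if errs else 1.0, boot, feats))
        scored.sort(key=lambda m: m[0])  # ascending OOB error
        return [(boot, feats) for _, boot, feats in scored[:n_keep]]

    def predict_enr_ensemble(models, X, y, x_test, k=5):
        # Final prediction: majority vote across the selected base models.
        X, y, x_test = np.asarray(X), np.asarray(y), np.asarray(x_test)
        votes = [enr_predict(X[boot][:, feats], y[boot], x_test[feats], k)
                 for boot, feats in models]
        labels, counts = np.unique(votes, return_counts=True)
        return labels[np.argmax(counts)]

Selecting base models by their OOB error mirrors the idea of discarding weak learners before the final vote, so that only the well-performing chains contribute to the ensemble prediction.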
Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for TumorC dataset.
Brief description of the datasets along with the corresponding number of features, observations, class-wise distributions and sources.
Classification error rates produced by different methods on simulated data.
Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Colon dataset.
Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Breastcancer dataset.
Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for DLBCL dataset.
Classification error rates produced by different methods on various subsets of genes.
Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Lungcancer dataset.
Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Prostate dataset.